Emoticon Smoothed Language Models for Twitter Sentiment Analysis

Kun-Lin Liu; Wu-Jun Li; Minyi Guo

doi:10.1609/aaai.v26i1.8353

Authors

Kun-Lin Liu Shanghai Jiao Tong University
Wu-Jun Li Shanghai Jiao Tong University
Minyi Guo Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v26i1.8353

Abstract

Twitter sentiment analysis (TSA) has become an active research topic in recent years. The goal of the task is to identify the attitude or opinion expressed in tweets, and it is typically formulated as a machine-learning-based text classification problem. Some methods train fully supervised models on manually labeled data, while others train on noisy labels such as emoticons and hashtags. Fully supervised models generally have access to only a limited amount of training data, because manually labeling tweets is labor-intensive and time-consuming. Models trained on noisy labels can draw on large amounts of data, but the noise in the labels makes it hard for them to achieve satisfactory performance. Hence, the best strategy is to use both manually labeled data and noisily labeled data for training. However, seamlessly integrating these two kinds of data into a single learning framework remains a challenge. In this paper, we present a novel model, called the emoticon smoothed language model (ESLAM), to address this challenge. The basic idea is to train a language model on the manually labeled data and then use the noisy emoticon data for smoothing. Experiments on real data sets demonstrate that ESLAM effectively integrates both kinds of data and outperforms methods that use only one of them.
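The basic idea stated in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's formulas, so everything specific here is an assumption made for illustration: unigram language models per class, linear interpolation with a hypothetical weight `alpha`, add-one smoothing on the emoticon-derived estimate, uniform class priors, and toy tweets invented for the example. It is not the authors' implementation.

```python
import math
from collections import Counter


def unigram_counts(tweets):
    """Count lowercase whitespace tokens over a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def smoothed_model(labeled, noisy, vocab, alpha=0.5):
    """Per-class unigram model: an estimate from manually labeled tweets,
    smoothed by interpolating with an estimate from emoticon-labeled tweets.
    alpha (hypothetical) weights the manually labeled estimate."""
    lab, noi = unigram_counts(labeled), unigram_counts(noisy)
    lab_total, noi_total = sum(lab.values()), sum(noi.values())
    model = {}
    for word in vocab:
        p_lab = lab[word] / lab_total if lab_total else 0.0
        # Add-one smoothing keeps every vocabulary word at nonzero probability.
        p_noi = (noi[word] + 1) / (noi_total + len(vocab))
        model[word] = alpha * p_lab + (1 - alpha) * p_noi
    return model


def classify(tweet, models):
    """Return the class whose model gives the tweet the highest log-likelihood
    (uniform class priors assumed; out-of-vocabulary words are skipped)."""
    def score(model):
        return sum(math.log(model[w]) for w in tweet.lower().split() if w in model)
    return max(models, key=lambda label: score(models[label]))


# Toy data, invented for illustration only.
labeled = {"pos": ["great phone"], "neg": ["bad battery"]}
noisy = {"pos": ["love this phone", "awesome day"],   # tweets that had ":)"
         "neg": ["hate this battery", "awful day"]}   # tweets that had ":("

vocab = set()
for group in (labeled, noisy):
    for tweets in group.values():
        vocab.update(unigram_counts(tweets))

models = {label: smoothed_model(labeled[label], noisy[label], vocab)
          for label in ("pos", "neg")}

print(classify("love awesome phone", models))
print(classify("awful", models))
```

In this sketch, words such as "love" or "awful" never occur in the manually labeled tweets, so the class decision for them comes entirely from the emoticon-derived term; that is the role the abstract assigns to the noisy data.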
